Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset
Virtual assistants such as Google Assistant, Alexa and Siri provide a
conversational interface to a large number of services and APIs spanning
multiple domains. Such systems need to support an ever-increasing number of
services with possibly overlapping functionality. Furthermore, some of these
services have little to no training data available. Existing public datasets
for task-oriented dialogue do not sufficiently capture these challenges since
they cover few domains and assume a single static ontology per domain. In this
work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing
over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds
the existing task-oriented dialogue corpora in scale, while also highlighting
the challenges associated with building large-scale virtual assistants. It
provides a challenging testbed for a number of tasks including language
understanding, slot filling, dialogue state tracking and response generation.
Along the same lines, we present a schema-guided paradigm for task-oriented
dialogue, in which predictions are made over a dynamic set of intents and
slots, provided as input, using their natural language descriptions. This
allows a single dialogue system to easily support a large number of services
and facilitates simple integration of new services without requiring additional
training data. Building upon the proposed paradigm, we release a model for
dialogue state tracking capable of zero-shot generalization to new APIs, while
remaining competitive in the regular setting.
Comment: To appear at AAAI 202
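As a rough illustration of the schema-guided idea described above, the sketch below pairs a user utterance with the natural language description of each slot, so that a model can score slots it has never seen by name. The class names, the `build_slot_inputs` helper, and the `[SEP]` separator are all hypothetical choices made for this example; this is not the model released with the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Slot:
    name: str
    description: str  # natural language description supplied by the service developer


@dataclass
class Service:
    name: str
    description: str
    slots: List[Slot] = field(default_factory=list)


def build_slot_inputs(utterance: str, service: Service) -> List[Tuple[str, str]]:
    """Return one (slot name, model input) pair per slot in the schema.

    The model input joins the slot description and the utterance with a
    separator token (an assumption of this sketch), so a new service only
    needs to provide descriptions, not training data.
    """
    return [
        (slot.name, f"{slot.description} [SEP] {utterance}")
        for slot in service.slots
    ]


# Hypothetical service schema used only to exercise the helper.
flights = Service(
    name="Flights",
    description="Search for flights",
    slots=[Slot("destination", "City the user is flying to")],
)
for slot_name, model_input in build_slot_inputs("I want to fly to Paris", flights):
    print(slot_name, "->", model_input)
```

Because the schema is passed in as data, adding a service in this sketch means constructing another `Service` object rather than changing the code.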
Show, Don't Tell: Demonstrations Outperform Descriptions for Schema-Guided Task-Oriented Dialogue
Building universal dialogue systems that can seamlessly operate across
multiple domains/APIs and generalize to new ones with minimal supervision and
maintenance is a critical challenge. Recent works have leveraged natural
language descriptions for schema elements to enable such systems; however,
descriptions can only indirectly convey schema semantics. In this work, we
propose Show, Don't Tell, a prompt format for seq2seq modeling which uses a
short labeled example dialogue to show the semantics of schema elements rather
than tell the model via descriptions. While requiring similar effort from
service developers, we show that using short examples as schema representations
with large language models results in stronger performance and better
generalization on two popular dialogue state tracking benchmarks: the
Schema-Guided Dialogue dataset and the MultiWoZ leave-one-out benchmark.
Comment: To appear at NAACL 202
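The contrast between telling and showing can be sketched as two ways of building a seq2seq input. Both functions below, along with the `[schema]`, `[example]`, `[labels]`, and `[dialogue]` markers, are hypothetical and are not the exact prompt format used in the paper; the sketch only illustrates that a short labeled example dialogue takes the place of slot descriptions.

```python
from typing import Dict


def description_prompt(descriptions: Dict[str, str], dialogue: str) -> str:
    """'Tell': prefix the dialogue with a description of every slot."""
    schema = " ".join(f"{name}: {text}" for name, text in descriptions.items())
    return f"[schema] {schema} [dialogue] {dialogue}"


def demonstration_prompt(
    example_dialogue: str, example_labels: Dict[str, str], dialogue: str
) -> str:
    """'Show': prefix the dialogue with a short labeled example dialogue."""
    labels = " ".join(f"{name}={value}" for name, value in example_labels.items())
    return f"[example] {example_dialogue} [labels] {labels} [dialogue] {dialogue}"


# Hypothetical inputs, invented for illustration.
user_turn = "user: a table for two in Oakland"
print(description_prompt({"city": "City where the restaurant is located"}, user_turn))
print(
    demonstration_prompt(
        "user: find me a place to eat in San Jose",
        {"city": "San Jose"},
        user_turn,
    )
)
```

In both cases the service developer writes a small amount of text per schema; only the form of that text differs.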
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Reinforcement learning from human feedback (RLHF) is effective at aligning
large language models (LLMs) to human preferences, but gathering high-quality
human preference labels is a key bottleneck. We conduct a head-to-head
comparison of RLHF vs. RL from AI Feedback (RLAIF) - a technique where
preferences are labeled by an off-the-shelf LLM in lieu of humans, and we find
that they result in similar improvements. On the task of summarization, human
evaluators prefer generations from both RLAIF and RLHF over a baseline
supervised fine-tuned model in ~70% of cases. Furthermore, when asked to rate
RLAIF vs. RLHF summaries, humans prefer both at equal rates. These results
suggest that RLAIF can yield human-level performance, offering a potential
solution to the scalability limitations of RLHF.
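A figure such as "preferred in ~70% of cases" is a win rate over pairwise comparisons. The sketch below shows that arithmetic under simple assumptions: each comparison is recorded as the label of the preferred system, and ties are not modeled. The `win_rate` helper and the toy labels are hypothetical and are not data from the paper.

```python
from typing import Sequence


def win_rate(preferences: Sequence[str], candidate: str) -> float:
    """Fraction of pairwise comparisons in which `candidate` was preferred."""
    if not preferences:
        raise ValueError("need at least one comparison")
    wins = sum(1 for preferred in preferences if preferred == candidate)
    return wins / len(preferences)


# Toy labels, invented for illustration only (not results from the paper).
toy_labels = ["rlaif", "sft", "rlaif", "rlaif"]
print(win_rate(toy_labels, "rlaif"))
```

The same function applies whether the labels come from human evaluators or from an off-the-shelf LLM acting as the labeler.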